Multi-Objective POMDPs with Lexicographic Reward Preferences

Authors

  • Kyle Hollins Wray
  • Shlomo Zilberstein
Abstract

We propose a model, the Lexicographic Partially Observable Markov Decision Process (LPOMDP), which extends POMDPs with lexicographic preferences over multiple value functions. It allows slack (slightly less-than-optimal values) in higher-priority preferences in order to facilitate improvement in lower-priority value functions. Many real-life situations are naturally captured by LPOMDPs with slack. We consider a semi-autonomous driving scenario in which time spent on the road is minimized while time spent driving autonomously is maximized. We propose two solutions to LPOMDPs, Lexicographic Value Iteration (LVI) and Lexicographic Point-Based Value Iteration (LPBVI), establishing convergence results and correctness within strong slack bounds. We test the algorithms using real-world road data provided by Open Street Map (OSM) for 10 major cities. Finally, we present GPU-based optimizations for point-based solvers, demonstrating that their application enables us to quickly solve vastly larger LPOMDPs and other variations of POMDPs.
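The core idea, lexicographic preferences relaxed by slack, is easiest to see in the fully observable analogue of the paper's LVI: solve the highest-priority objective first, keep every action whose value is within the slack of the optimum, and then optimize the next objective over only those surviving actions. The sketch below implements this for a toy MDP; the problem instance, variable names, and slack values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def value_iteration(P, R, actions, gamma=0.95, eps=1e-6):
    """Value iteration restricted to per-state action sets.

    P[a] is the |S|x|S| transition matrix for action a, R[a] the reward
    vector over states, and actions[s] the actions allowed in state s.
    """
    n = P[0].shape[0]
    V = np.zeros(n)
    while True:
        # Q-values for every action under the current value estimate.
        Q = [R[a] + gamma * P[a] @ V for a in range(len(P))]
        # Maximize only over the actions still allowed in each state.
        V_new = np.array([max(Q[a][s] for a in actions[s]) for s in range(n)])
        if np.max(np.abs(V_new - V)) < eps:
            return V_new, Q
        V = V_new

def lexicographic_vi(P, rewards, slacks, gamma=0.95):
    """Solve objectives in priority order: at each level, keep only the
    actions whose Q-value is within that level's slack of the optimum."""
    n = P[0].shape[0]
    actions = [list(range(len(P))) for _ in range(n)]
    for R, eta in zip(rewards, slacks):
        V, Q = value_iteration(P, R, actions, gamma)
        actions = [[a for a in actions[s] if Q[a][s] >= V[s] - eta]
                   for s in range(n)]
    return V, actions

# Toy problem: action 0 is marginally better for objective 1, but with
# slack 0.2 action 1 survives, and objective 2 then selects it.
P = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
R1 = [np.array([1.0, 1.0]), np.array([0.9, 0.9])]   # higher-priority objective
R2 = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]   # lower-priority objective
V, acts = lexicographic_vi(P, [R1, R2], slacks=[0.2, 0.0])
```

With slack 0.2 at the top level the policy chooses action 1 everywhere; setting the slack to 0 instead pins the policy to action 0, illustrating how slack trades a small loss in the primary objective for gains in the secondary one. The full POMDP algorithms (LVI over belief points, LPBVI) extend this filtering idea to alpha-vector backups.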


Similar resources

Lexicographic goal programming approach for portfolio optimization

This paper investigates the optimum portfolio for an investor, taking into account five criteria. The mean-variance model of portfolio optimization introduced by Markowitz includes two objective functions; these two criteria, risk and return, do not encompass all of the information about an investment; information like annual dividends, S&P star ranking and return in later years which is ...

Multi-Objective MDPs with Conditional Lexicographic Reward Preferences

Sequential decision problems that involve multiple objectives are prevalent. Consider, for example, a driver of a semi-autonomous car who may want to optimize competing objectives such as travel time and the effort associated with manual driving. We introduce a rich model called Lexicographic MDP (LMDP) and a corresponding planning algorithm called LVI that generalize previous work by allowing for...

Solving Multi-Objective MDP with Lexicographic Preference: An application to stochastic planning with multiple quantile objective

In the most common settings of the Markov Decision Process (MDP), an agent evaluates a policy based on the expectation of the (discounted) sum of rewards. However, in many applications this criterion might not be suitable, from two perspectives: first, in risk-averse situations the expectation of accumulated rewards is not robust enough; this is the case when the distribution of accumulated reward is heavily skewed; anot...

Dynamic Lexicographic Approach for Heuristic Multi-objective Optimization

This paper proposes a dynamic lexicographic approach to tackle multi-objective optimization problems. In this method, the ordering of objectives, which reflects their relative preferences, is changed in a dynamic fashion during the search. This approach eliminates the need for the decision-maker to establish fixed preferences among the competing objectives, which is often difficult. At the same...

Compendious Lexicographic Method for Multi-objective Optimization

A modification of the standard lexicographic method, used for linear multiobjective optimization problems, is presented. An algorithm for solving this kind of problem is developed for the cases of two and three unknowns. The algorithm uses the general idea of assigning a lexicographic order to the objective functions, combined with the graphical method of linear programming. Implementation de...


Publication date: 2015